Analysis of Zillow Home Value Index across Time and Location

Gaurav Rimal, Mike Lupo, Sean Moadel

Introduction:

According to the Federal Reserve, the housing market has tightened considerably. The supply of homes available for sale has fallen to historically low levels, and home price growth has increased greatly during the pandemic. In this tutorial, we analyze how home values have changed across the United States in the 21st century. We measure changes in the Zillow Home Value Index (ZHVI), a seasonally adjusted measure of typical home value and market changes.

ZHVI is a measure provided by Zillow Inc. that captures two key variables, home value and housing market appreciation, both in the current housing market and over time. For more details about ZHVI and how it is calculated, see Zillow's methodology documentation. The ZHVI User Guide may also be helpful.

Using the ZHVI dataset, our goal is to test whether the size of a region affects its ZHVI. To do this, we walk through the data science lifecycle, which has five parts:

  1. Data Collection
  2. Data Processing
  3. Exploratory Analysis and Data Visualization
  4. Analysis, Hypothesis Testing, and Machine Learning
  5. Insight and Policy Decision

Data Collection

This data was obtained from https://www.zillow.com/research/data/ under Home Values.

First, let's import the libraries that we will be using:

If you do not have any of these libraries installed, you can install them by running pip3 install [package] in a terminal. If you need any more information, you can see the documentation or a tutorial for each library listed below:

Data Processing

Now, let's clean the data so we can extract only the relevant details such as Dates, Locations and the ZHVI.
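The cleaning step can be sketched as a wide-to-long reshape. This is a minimal illustration, assuming the downloaded file has one row per region and one column per month; the column names (RegionName, State, SizeRank) and the sample values are assumptions made for this example, not taken from the actual file.

```python
import pandas as pd

# Hypothetical miniature of the wide Zillow export: one row per region,
# one column per month. Column names and values are illustrative assumptions.
wide = pd.DataFrame({
    "RegionName": ["Region A", "Region B"],
    "State": ["NY", "HI"],
    "SizeRank": [1, 2],
    "2000-01-31": [150000.0, 210000.0],
    "2000-02-29": [151000.0, None],
})

id_cols = ["RegionName", "State", "SizeRank"]

# Reshape to long format: one row per (region, date) pair.
long = wide.melt(id_vars=id_cols, var_name="Date", value_name="ZHVI")

# Convert the former column labels into real timestamps.
long["Date"] = pd.to_datetime(long["Date"])

print(long)
```

A long table like this keeps only dates, locations, and ZHVI, and is easier to group and plot than the wide layout.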

The data now looks clean but there could be some missing values. Let's check!
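One way to check for missing values is to count them per column. The sketch below uses a small made-up frame; the column names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical long-format sample with one missing ZHVI value.
df = pd.DataFrame({
    "State": ["NY", "NY", "HI"],
    "Date": pd.to_datetime(["2000-01-31", "2000-01-31", "2000-01-31"]),
    "ZHVI": [150000.0, None, 210000.0],
})

# Number of missing entries in each column.
missing_per_column = df.isna().sum()

# Fraction of ZHVI observations that are missing.
missing_share = df["ZHVI"].isna().mean()

print(missing_per_column)
print(f"{missing_share:.1%} of ZHVI values are missing")
```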

In order to impute the missing values, we are going to calculate the average ZHVI for the whole state on that date.
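A minimal sketch of this imputation, assuming a long-format table with State, Date, and ZHVI columns (hypothetical names and values): each missing value is replaced by the mean ZHVI of the other regions in the same state on the same date.

```python
import pandas as pd

# Hypothetical sample: region C is missing a value in a state that has
# other observations; region D is the only region in its state.
df = pd.DataFrame({
    "RegionName": ["A", "B", "C", "D"],
    "State": ["NY", "NY", "NY", "HI"],
    "Date": pd.to_datetime(["2000-01-31"] * 4),
    "ZHVI": [100000.0, 200000.0, None, None],
})

# Mean ZHVI per (state, date), aligned back to the original rows.
state_date_mean = df.groupby(["State", "Date"])["ZHVI"].transform("mean")

# Fill gaps only where a value is missing.
df["ZHVI"] = df["ZHVI"].fillna(state_date_mean)

print(df)
```

In this sample, region D stays missing because no other region in its state has a value on that date, so a second pass would still be needed for such rows.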

There are still some missing values. Let's fill those with average ZHVI for the whole region across our window of time.
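A possible second pass is sketched below, interpreting "region" as a single region's own time series; that reading, like the column names and values, is an assumption of this example.

```python
import pandas as pd

# Hypothetical sample: region D lacks its first month.
df = pd.DataFrame({
    "RegionName": ["D", "D", "D"],
    "Date": pd.to_datetime(["2000-01-31", "2000-02-29", "2000-03-31"]),
    "ZHVI": [None, 300000.0, 310000.0],
})

# Mean ZHVI of each region over the whole time window.
region_mean = df.groupby("RegionName")["ZHVI"].transform("mean")

# Fill the remaining gaps with that region-level mean.
df["ZHVI"] = df["ZHVI"].fillna(region_mean)

print(df)
```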

Let's just calculate the average ZHVI for states as well.
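The state-level average can be sketched with a groupby, again on a small made-up frame with assumed column names:

```python
import pandas as pd

# Hypothetical long-format sample.
df = pd.DataFrame({
    "State": ["NY", "NY", "HI"],
    "ZHVI": [100000.0, 200000.0, 600000.0],
})

# Average ZHVI per state, highest first.
state_avg = (
    df.groupby("State", as_index=False)["ZHVI"]
    .mean()
    .sort_values("ZHVI", ascending=False)
    .reset_index(drop=True)
)

print(state_avg)
```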

Finally, the data is ready to be dissected. Let's move on to the analysis.

Exploratory Analysis & Data Visualization

Here is how ZHVI has changed for every state across time:

We can see that states like New York and Hawaii have had a high ZHVI for almost the entire period. ZHVI seems to peak around times of crisis, such as the 2008 recession and the COVID-19 pandemic.

Now, let's look at the ZHVI for each year:

The yearly average ZHVI shows a net increase from 2000 to 2020. The spread of ZHVI has also grown much wider than it was in the earlier years of this period.
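The yearly average and spread can also be summarized numerically. The sketch below uses the sample standard deviation as the measure of spread, which is a choice made for this example; the data values are made up.

```python
import pandas as pd

# Hypothetical sample with two observations in each of two years.
df = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2000-01-31", "2000-06-30", "2020-01-31", "2020-06-30"]
    ),
    "ZHVI": [100000.0, 120000.0, 200000.0, 400000.0],
})

# Mean and standard deviation of ZHVI for each calendar year.
yearly = df.groupby(df["Date"].dt.year)["ZHVI"].agg(["mean", "std"])
yearly.index.name = "Year"

print(yearly)
```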

Let's look at the average ZHVI per region size:

Small regions can have a very high ZHVI, and large regions can have a low one. Region size and ZHVI do not seem to be connected.

Next, have a look at the average ZHVI across all the states:

Hawaii, DC, California and Massachusetts have some of the most valuable homes.

Analysis, Hypothesis Testing, & ML

The next step is to create a model for the data that relates region size to ZHVI.
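As an illustration of what such a model could look like, the sketch below fits an ordinary least squares line of average ZHVI on a region size rank and computes R-squared by hand. The choice of a simple linear model, the use of a size rank as the predictor, and all data values are assumptions for this example.

```python
import numpy as np

# Hypothetical data: size rank (1 = largest region) and average ZHVI.
size_rank = np.array([1, 2, 3, 4, 5, 6], dtype=float)
avg_zhvi = np.array(
    [300000, 150000, 420000, 180000, 350000, 200000], dtype=float
)

# Fit a straight line; polyfit returns coefficients from highest degree down.
slope, intercept = np.polyfit(size_rank, avg_zhvi, deg=1)
predicted = slope * size_rank + intercept

# R-squared: share of the variance in ZHVI explained by the fitted line.
ss_res = np.sum((avg_zhvi - predicted) ** 2)
ss_tot = np.sum((avg_zhvi - avg_zhvi.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print(f"slope={slope:.1f}, intercept={intercept:.1f}, R^2={r_squared:.3f}")
```

An R-squared near 0 means the line explains little of the variation in ZHVI, while a value near 1 means it explains most of it.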

Interpretation: Insight & Policy Decision

The graph above shows that the model clearly does not fit the data, and the data suggest that region size and the Zillow Home Value Index are not correlated. The model performance statistics support this: the R-squared value is very low, at 0.084, so only about 8% of the variation in ZHVI is explained by region size. Factors like economic crises may have more impact on ZHVI than region size. For example, the economic downturn of 2008 affected home values, as reflected in the ZHVI vs. Time graph above.